%abstract

For any given application, there is an optimal throughput point in the
space of per-processor performance and the number of such processors
given to that application.  However, due to thermal, yield, and other
constraints, not all of these optimal points can plausibly be
constructed with a given technology. In this paper, we look at how
emerging steep-slope devices, 3D circuit integration, and trends in
process technology scaling will combine to shift the boundaries of
both attainable performance and the optimal set of technologies to
employ to achieve it.  We propose a heterogeneous-technology 3D
architecture capable of operating efficiently at an expanded number of
points in this larger design space and devise a heterogeneity- and
thermal-aware scheduling algorithm to exploit its potential. Our heterogeneous
mapping techniques produce speedups ranging from 17\% for high-end
server workloads running at around 90$^\circ$C to over
160\% for embedded systems running below 60$^\circ$C.


% LocalWords:  3D
